Introduction


The data relate to a phone-call marketing campaign run by a banking institution, and the goal is to predict whether a client will subscribe to a term deposit. Term deposits are considered a more secure investment than stocks because they are largely protected from market fluctuations. Generally, a client invests a specific sum for a set period (e.g. 5 months) at a predetermined interest rate. The funds are withdrawn once that period has passed, or earlier, typically with a penalty.

The dataset contains every contact attempt made to the clients; a client may be contacted multiple times before the campaign outcome (whether the client subscribes to a term deposit) is known. In total, there are 41,188 observations. For the social and economic context attributes, keep in mind that the indicators are drawn from the general population rather than from the individual client, so they act as a common, normalizing context for the data.

MARKETING CAMPAIGNS

DATASET

https://archive.ics.uci.edu/ml/datasets/bank%20marketing

Variables

Input variables:

bank client data:

1 - age (numeric)
2 - job: type of job (categorical: ‘admin.’,‘blue-collar’,‘entrepreneur’,‘housemaid’,‘management’,‘retired’,‘self-employed’,‘services’,‘student’,‘technician’,‘unemployed’,‘unknown’)
3 - marital: marital status (categorical: ‘divorced’,‘married’,‘single’,‘unknown’; note: ‘divorced’ means divorced or widowed)
4 - education (categorical: ‘basic.4y’,‘basic.6y’,‘basic.9y’,‘high.school’,‘illiterate’,‘professional.course’,‘university.degree’,‘unknown’)
5 - default: has credit in default? (categorical: ‘no’,‘yes’,‘unknown’)
6 - housing: has housing loan? (categorical: ‘no’,‘yes’,‘unknown’)
7 - loan: has personal loan? (categorical: ‘no’,‘yes’,‘unknown’)

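related to the last contact of the current campaign:

8 - contact: contact communication type (categorical: ‘cellular’,‘telephone’)
9 - month: last contact month of year (categorical: ‘jan’,‘feb’,‘mar’,…,‘nov’,‘dec’)
10 - day_of_week: last contact day of the week (categorical: ‘mon’,‘tue’,‘wed’,‘thu’,‘fri’)
11 - duration: last contact duration, in seconds (numeric)

other attributes:

12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: ‘failure’,‘nonexistent’,‘success’)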

social and economic context attributes

16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)

Output variable (desired target):

21 - y - has the client subscribed a term deposit? (binary: ‘yes’,‘no’)
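As a minimal sketch of reading records in this schema, the Python snippet below parses semicolon-delimited text into typed rows and encodes the target y as 1/0. It assumes the UCI file (bank-additional-full.csv) is semicolon-delimited with double-quoted strings, which is worth checking against the downloaded copy. The names COLUMNS, NUMERIC, parse_rows and SAMPLE are illustrative, and the sample row is only an example in the same format.

```python
import csv
import io

# The 21 columns in the order listed above; "y" is the target.
COLUMNS = [
    "age", "job", "marital", "education", "default", "housing", "loan",
    "contact", "month", "day_of_week", "duration", "campaign", "pdays",
    "previous", "poutcome", "emp.var.rate", "cons.price.idx",
    "cons.conf.idx", "euribor3m", "nr.employed", "y",
]
# Numeric inputs are stored as floats; everything else stays a string.
NUMERIC = {
    "age", "duration", "campaign", "pdays", "previous", "emp.var.rate",
    "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed",
}


def parse_rows(text):
    """Parse semicolon-delimited text (with a header row) into typed dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for raw in reader:
        row = {}
        for name in COLUMNS:
            value = raw[name]
            row[name] = float(value) if name in NUMERIC else value
        # Encode the binary target: 'yes' -> 1, 'no' -> 0.
        row["y"] = 1 if raw["y"] == "yes" else 0
        rows.append(row)
    return rows


# Illustrative input: a quoted header plus one row in the same format.
HEADER = ";".join(f'"{name}"' for name in COLUMNS)
ROW = ('56;"housemaid";"married";"basic.4y";"no";"no";"no";"telephone";'
       '"may";"mon";261;1;999;0;"nonexistent";1.1;93.994;-36.4;4.857;5191;"no"')
SAMPLE = HEADER + "\n" + ROW + "\n"

rows = parse_rows(SAMPLE)
print(rows[0]["job"], rows[0]["pdays"], rows[0]["y"])
```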

Data Cleaning

Cleaning the data, generating new variables, and handling missing values

Cleaning Data

$ age : int
$ job : Factor
$ marital : Factor
$ education : Factor
$ default : Factor
$ housing : Factor
$ loan : Factor
$ contact : Factor
$ month : Factor
$ day_of_week : Factor
$ duration : int
$ campaign : int
$ pdays : int
$ previous : int
$ poutcome : Factor
$ emp.var.rate : Factor
$ cons.price.idx: Factor
$ cons.conf.idx : Factor
$ euribor3m : Factor
$ nr.employed : Factor
$ y : Factor

Converting columns to correct types

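Convert the columns that were read in with the wrong type to numeric. The social and economic context attributes (emp.var.rate, cons.price.idx, cons.conf.idx, euribor3m, nr.employed) were loaded as factors. Applying as.integer() directly to a factor in R returns the factor's level codes rather than the underlying values, so these columns should be converted with as.numeric(as.character(x)). The ranges these variables show in the training summary below (e.g. emp.var.rate from 1 to 10, nr.employed from 1 to 11) are consistent with level codes rather than the actual indicator values.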
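The factor pitfall can be illustrated outside R. The Python sketch below mimics how a factor stores a numeric column: a sorted list of distinct levels plus 1-based integer codes. Taking the codes (like as.integer(f)) yields small integers, while mapping each code back to its level label and parsing it (like as.numeric(as.character(f))) recovers the real values. The helper as_factor is hypothetical, and Python's string sort stands in for R's locale-dependent level ordering.

```python
def as_factor(values):
    """Mimic an R factor: sorted distinct levels plus 1-based integer codes.

    Python's string sort stands in for R's locale-dependent level order.
    """
    levels = sorted(set(values))
    position = {level: i + 1 for i, level in enumerate(levels)}
    return levels, [position[v] for v in values]


# A few emp.var.rate values read in as text, as happens when the
# column is imported as a factor.
raw = ["1.1", "1.4", "-1.8", "1.1", "-0.1"]
levels, codes = as_factor(raw)

wrong = codes                                    # like as.integer(f)
right = [float(levels[c - 1]) for c in codes]    # like as.numeric(as.character(f))
print(wrong)
print(right)
```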

EDA - Determining Significant Columns

  1. pdays - most observations have no prior contact, so no days-since-last-contact value is recorded for them (pdays = 999). Only 3.68% of the observations (1,515 records) were contacted in a previous campaign (pdays not equal to 999). This appears to be an insignificant column.

  2. previous - while ~14% of the clients were contacted before the current campaign, 11% were contacted only once before, leaving only 3% that were contacted more than once previously.

  3. campaign - potentially significant.

  4. duration - this should be removed. The dataset documentation notes that duration is not known before a call is performed and that y is known once the call ends, so this attribute should be discarded for a realistic predictive model.
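A minimal sketch of how the shares above can be computed, assuming pdays and previous are available as plain Python lists. The helper name contact_history_shares and the toy values are hypothetical; the 999 sentinel is treated as "not previously contacted", as the variable description states. The last line checks that the quoted 3.68% corresponds to 1,515 of the 41,188 records.

```python
NOT_CONTACTED = 999  # pdays sentinel: client was not previously contacted


def contact_history_shares(pdays, previous):
    """Shares of clients with a recorded pdays and with prior contacts."""
    n = len(pdays)
    recorded = sum(1 for d in pdays if d != NOT_CONTACTED)
    any_prior = sum(1 for p in previous if p > 0)
    one_prior = sum(1 for p in previous if p == 1)
    return {
        "pdays_recorded": recorded / n,
        "previous_any": any_prior / n,
        "previous_once": one_prior / n,
        "previous_multiple": (any_prior - one_prior) / n,
    }


# Toy columns for 10 clients (not real data).
pdays = [999] * 8 + [3, 6]
previous = [0] * 7 + [1, 1, 2]
print(contact_history_shares(pdays, previous))

# Share of previously contacted records in the full data, in percent.
print(round(100 * 1515 / 41188, 2))
```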

Data Manipulation

New Variable Generation

Handling of Missing Values

There are no NA values, because missing values are coded as the category ‘unknown’.

Columns with missing data (number of ‘unknown’ values):
- Job: 330 (type of job)
- Marital: 80
- Education: 1,731 (highest education received)
- Default: 8,597 (whether the client has credit in default, i.e. has failed to pay)
- Housing: 990 (whether the client has a housing loan)
- Loan: 990 (whether the client has a personal loan)

Total number of observations: 41,188
Total number of observations with at least one missing value: 10,700

~26% of the observations have at least one missing data record.
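A minimal sketch of these counts, assuming each record is a dict and that missing values are the literal string "unknown". The helper missing_profile and the toy rows are hypothetical; on the full data the same logic would give the 10,700 of 41,188 (about 26%) reported above.

```python
MISSING = "unknown"  # missing values are coded as this category


def missing_profile(rows, columns):
    """Count 'unknown' per column, rows with >= 1 unknown, rows with > 1."""
    per_column = {c: sum(1 for r in rows if r[c] == MISSING) for c in columns}
    per_row = [sum(1 for c in columns if r[c] == MISSING) for r in rows]
    any_missing = sum(1 for k in per_row if k >= 1)
    multi_missing = sum(1 for k in per_row if k > 1)
    return per_column, any_missing, multi_missing


# Toy rows, not taken from the dataset.
rows = [
    {"job": "admin.", "default": "no", "housing": "yes"},
    {"job": "unknown", "default": "unknown", "housing": "no"},
    {"job": "services", "default": "unknown", "housing": "yes"},
    {"job": "retired", "default": "no", "housing": "unknown"},
]
per_column, any_missing, multi_missing = missing_profile(
    rows, ["job", "default", "housing"])
print(per_column, any_missing, multi_missing)
print(f"{any_missing / len(rows):.0%} of rows have at least one unknown")
```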

The summary statistics of the observations with missing values do not initially appear to differ noticeably from those of the full data. A concern is that all of the columns with missing values are categorical. Most affected rows are missing only one value; fewer than 20% are missing more than one field.

Options:
- Ignore the affected observations (not ideal)
- Ignore the affected variables (to be determined during analysis)
- Develop a model to predict the missing values
- Treat missing data as just another category (recommended)
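Treating ‘unknown’ as just another category means it receives its own indicator column when the categorical variables are dummy-coded, which is what terms such as housingunknown and loanunknown in the model output below represent. A minimal Python sketch of treatment (dummy) coding, assuming the alphabetically first level is the baseline as with R's default contrasts for unordered factors; dummy_code is a hypothetical helper.

```python
def dummy_code(values):
    """Treatment (dummy) coding with 'unknown' kept as an ordinary level.

    Levels are sorted and the first one is the baseline, which gets no
    column, mirroring R's default contrasts for unordered factors.
    """
    levels = sorted(set(values))
    kept = levels[1:]  # drop the baseline level
    matrix = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, matrix


housing = ["no", "yes", "unknown", "no"]
columns, matrix = dummy_code(housing)
print(["housing" + c for c in columns])  # names as they appear in glm output
for value, row in zip(housing, matrix):
    print(value, row)
```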

Data Plots

Plotting for trend determination

Full dataset: plots of the numeric variables, colored by whether the client subscribed to a term deposit.

Missing data: plots of the numeric variables, colored by whether the client subscribed to a term deposit.

Plots of the categorical variables, colored by whether the client subscribed to a term deposit.

DATA SPLIT


The response variable is heavily imbalanced, with far more ‘no’ than ‘yes’ responses. To balance the classes, the training set is built by downsampling: an equal number of ‘no’ and ‘yes’ records is drawn, using only part of the ‘yes’ records so that the minority class is not entirely excluded from the test set. The training records are then removed from the full dataset, and the remaining records form the test set.
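A minimal sketch of this split in Python, assuming the training set takes the same number of rows from each class (the training summary below shows 1,182 ‘no’ and 1,182 ‘yes’) and that every row not drawn becomes part of the test set. The function downsample_split and its parameters are hypothetical names; the original split was done in R. Note that the resulting test set keeps the original class imbalance.

```python
import random


def downsample_split(rows, n_per_class, label_key="y", seed=42):
    """Draw n_per_class rows from each class for training; the rest is test.

    Raises if n_per_class would use up the whole minority class, so some
    minority rows always remain for the test set.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, row in enumerate(rows):
        by_class.setdefault(row[label_key], []).append(i)
    if n_per_class >= min(len(ix) for ix in by_class.values()):
        raise ValueError("n_per_class must leave minority rows for testing")
    train_idx = set()
    for indices in by_class.values():
        train_idx.update(rng.sample(indices, n_per_class))
    train = [rows[i] for i in sorted(train_idx)]
    test = [rows[i] for i in range(len(rows)) if i not in train_idx]
    return train, test


# Toy data: 90 'no' and 10 'yes'.
data = [{"id": i, "y": "yes" if i < 10 else "no"} for i in range(100)]
train, test = downsample_split(data, n_per_class=5)
print(len(train), len(test))
```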

Objective 1: Simple Logistic Regression Model


The primary focus of Objective 1 is to preserve interpretability while still building an accurate model that predicts efficiently.

Original Logistic Regression

##  [1] "age"            "job"            "marital"        "education"     
##  [5] "default"        "housing"        "loan"           "contact"       
##  [9] "month"          "day_of_week"    "duration"       "campaign"      
## [13] "pdays"          "previous"       "poutcome"       "emp.var.rate"  
## [17] "cons.price.idx" "cons.conf.idx"  "euribor3m"      "nr.employed"   
## [21] "y"              "pdays_0"        "ID"
##       age                 job          marital                   education  
##  Min.   :19.00   admin.     :621   divorced: 256   university.degree  :756  
##  1st Qu.:32.00   blue-collar:452   married :1407   high.school        :511  
##  Median :38.00   technician :378   single  : 696   professional.course:326  
##  Mean   :40.52   services   :190   unknown :   5   basic.9y           :302  
##  3rd Qu.:48.00   management :168                   basic.4y           :230  
##  Max.   :89.00   retired    :141                   unknown            :124  
##                  (Other)    :414                   (Other)            :115  
##     default        housing          loan           contact         month    
##  no     :1996   no     :1078   no     :1980   cellular :1709   may    :639  
##  unknown: 368   unknown:  54   unknown:  54   telephone: 655   jul    :392  
##  yes    :   0   yes    :1232   yes    : 330                    aug    :334  
##                                                                jun    :294  
##                                                                nov    :235  
##                                                                apr    :200  
##                                                                (Other):270  
##  day_of_week    duration         campaign         previous     
##  fri:420     Min.   :   5.0   Min.   : 1.000   Min.   :0.0000  
##  mon:447     1st Qu.: 146.8   1st Qu.: 1.000   1st Qu.:0.0000  
##  thu:508     Median : 265.0   Median : 2.000   Median :0.0000  
##  tue:480     Mean   : 382.5   Mean   : 2.267   Mean   :0.3249  
##  wed:509     3rd Qu.: 511.0   3rd Qu.: 3.000   3rd Qu.:0.0000  
##              Max.   :4199.0   Max.   :23.000   Max.   :6.0000  
##                                                                
##         poutcome      emp.var.rate.V1    cons.price.idx.V1   cons.conf.idx.V1  
##  failure    : 284   Min.   : 1.000000   Min.   : 1.000000   Min.   : 1.000000  
##  nonexistent:1836   1st Qu.: 5.000000   1st Qu.: 9.000000   1st Qu.:10.000000  
##  success    : 244   Median : 6.000000   Median :14.000000   Median :18.000000  
##                     Mean   : 6.861675   Mean   :14.280457   Mean   :15.228849  
##                     3rd Qu.:10.000000   3rd Qu.:19.000000   3rd Qu.:20.000000  
##                     Max.   :10.000000   Max.   :26.000000   Max.   :26.000000  
##                                                                                
##     euribor3m.V1       nr.employed.V1      y           pdays_0       
##  Min.   :  1.00000   Min.   : 1.000000   no :1182   Min.   : 0.0000  
##  1st Qu.:198.00000   1st Qu.: 6.000000   yes:1182   1st Qu.: 0.0000  
##  Median :268.00000   Median : 9.000000              Median : 0.0000  
##  Mean   :226.07953   Mean   : 7.759729              Mean   : 0.6882  
##  3rd Qu.:304.00000   3rd Qu.:11.000000              3rd Qu.: 0.0000  
##  Max.   :315.00000   Max.   :11.000000              Max.   :27.0000  
## 
##          y
## housing      no   yes
##   no      16596  2026
##   unknown   883   107
##   yes     19069  2507
## 
## Call:
## glm(formula = y ~ ., family = "binomial", data = trainingsData2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -5.7857  -0.3569  -0.0356   0.4356   2.5966  
## 
## Coefficients: (1 not defined because of singularities)
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   2.015e+00  8.093e-01   2.490  0.01278 *  
## age                           1.298e-03  8.261e-03   0.157  0.87517    
## jobblue-collar               -4.388e-01  2.601e-01  -1.687  0.09157 .  
## jobentrepreneur              -1.439e-01  3.935e-01  -0.366  0.71452    
## jobhousemaid                 -3.949e-02  4.637e-01  -0.085  0.93212    
## jobmanagement                -5.238e-02  2.918e-01  -0.179  0.85756    
## jobretired                    6.214e-01  3.784e-01   1.642  0.10056    
## jobself-employed             -5.731e-01  3.849e-01  -1.489  0.13652    
## jobservices                   8.818e-03  2.920e-01   0.030  0.97591    
## jobstudent                    3.180e-01  3.754e-01   0.847  0.39690    
## jobtechnician                 1.197e-01  2.515e-01   0.476  0.63401    
## jobunemployed                 3.631e-01  4.195e-01   0.866  0.38675    
## jobunknown                    7.299e-01  7.593e-01   0.961  0.33642    
## maritalmarried                7.402e-02  2.238e-01   0.331  0.74083    
## maritalsingle                 2.718e-01  2.601e-01   1.045  0.29602    
## maritalunknown               -6.659e-01  1.153e+00  -0.578  0.56344    
## educationbasic.6y             1.946e-01  4.114e-01   0.473  0.63613    
## educationbasic.9y            -4.607e-02  3.100e-01  -0.149  0.88184    
## educationhigh.school         -7.125e-02  3.132e-01  -0.228  0.82003    
## educationilliterate           7.542e+00  3.247e+02   0.023  0.98147    
## educationprofessional.course  1.135e-01  3.395e-01   0.334  0.73806    
## educationuniversity.degree    4.729e-01  3.093e-01   1.529  0.12631    
## educationunknown              2.401e-01  4.001e-01   0.600  0.54850    
## defaultunknown               -1.502e-01  2.193e-01  -0.685  0.49343    
## housingunknown               -1.346e-01  4.543e-01  -0.296  0.76708    
## housingyes                    1.648e-01  1.400e-01   1.177  0.23911    
## loanunknown                          NA         NA      NA       NA    
## loanyes                      -1.709e-01  1.957e-01  -0.873  0.38257    
## contacttelephone             -1.757e-01  2.457e-01  -0.715  0.47458    
## monthaug                     -6.475e-01  3.847e-01  -1.683  0.09239 .  
## monthdec                      8.580e-01  1.121e+00   0.765  0.44411    
## monthjul                     -4.859e-01  3.075e-01  -1.580  0.11406    
## monthjun                      1.156e-01  3.060e-01   0.378  0.70570    
## monthmar                      4.244e-01  4.117e-01   1.031  0.30266    
## monthmay                     -1.808e+00  2.533e-01  -7.138 9.46e-13 ***
## monthnov                     -1.112e+00  3.666e-01  -3.033  0.00242 ** 
## monthoct                     -2.604e-01  4.301e-01  -0.605  0.54496    
## monthsep                      1.943e-01  7.115e-01   0.273  0.78480    
## day_of_weekmon               -3.061e-01  2.270e-01  -1.349  0.17742    
## day_of_weekthu               -2.650e-01  2.244e-01  -1.181  0.23770    
## day_of_weektue               -3.962e-01  2.234e-01  -1.774  0.07609 .  
## day_of_weekwed               -1.418e-01  2.232e-01  -0.635  0.52535    
## duration                      7.914e-03  3.860e-04  20.504  < 2e-16 ***
## campaign                     -4.655e-02  4.054e-02  -1.148  0.25083    
## previous                      3.666e-02  2.278e-01   0.161  0.87215    
## poutcomenonexistent           4.321e-01  3.327e-01   1.299  0.19403    
## poutcomesuccess               1.861e+00  3.856e-01   4.826 1.40e-06 ***
## emp.var.rate                 -3.936e-02  4.868e-02  -0.809  0.41875    
## cons.price.idx               -9.007e-02  1.760e-02  -5.118 3.08e-07 ***
## cons.conf.idx                 8.587e-03  2.214e-02   0.388  0.69813    
## euribor3m                     9.086e-04  2.672e-03   0.340  0.73388    
## nr.employed                  -4.321e-01  7.497e-02  -5.764 8.23e-09 ***
## pdays_0                      -2.760e-02  4.743e-02  -0.582  0.56060    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 1450.9  on 2312  degrees of freedom
## AIC: 1554.9
## 
## Number of Fisher Scoring iterations: 11
## 
## Call:
## glm(formula = y ~ job + education + contact + month + day_of_week + 
##     campaign + poutcome + cons.price.idx + nr.employed, family = "binomial", 
##     data = trainingsData2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7799  -0.8487  -0.2234   0.8065   2.0740  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    2.47260    0.36306   6.810 9.73e-12 ***
## jobblue-collar                -0.03023    0.18442  -0.164 0.869797    
## jobentrepreneur                0.32460    0.26398   1.230 0.218830    
## jobhousemaid                  -0.01002    0.33984  -0.029 0.976476    
## jobmanagement                  0.19740    0.21032   0.939 0.347954    
## jobretired                     0.65718    0.27539   2.386 0.017017 *  
## jobself-employed               0.02237    0.27994   0.080 0.936304    
## jobservices                    0.14278    0.20694   0.690 0.490240    
## jobstudent                     0.37249    0.28968   1.286 0.198491    
## jobtechnician                  0.15462    0.18105   0.854 0.393097    
## jobunemployed                  0.19178    0.32834   0.584 0.559153    
## jobunknown                     0.45448    0.56062   0.811 0.417550    
## educationbasic.6y              0.27485    0.28290   0.972 0.331269    
## educationbasic.9y              0.26757    0.22155   1.208 0.227152    
## educationhigh.school           0.29699    0.22366   1.328 0.184237    
## educationilliterate           11.77535  324.74389   0.036 0.971075    
## educationprofessional.course   0.10327    0.24897   0.415 0.678297    
## educationuniversity.degree     0.55574    0.22235   2.499 0.012440 *  
## educationunknown               0.30765    0.29471   1.044 0.296526    
## contacttelephone              -0.56018    0.15255  -3.672 0.000241 ***
## monthaug                      -0.56257    0.22148  -2.540 0.011084 *  
## monthdec                       1.40186    1.06442   1.317 0.187835    
## monthjul                      -0.13876    0.22235  -0.624 0.532586    
## monthjun                       0.28091    0.24770   1.134 0.256766    
## monthmar                       0.39582    0.37897   1.044 0.296266    
## monthmay                      -0.96424    0.19709  -4.892 9.96e-07 ***
## monthnov                      -0.64135    0.22844  -2.808 0.004992 ** 
## monthoct                      -0.03536    0.35156  -0.101 0.919881    
## monthsep                       0.75049    0.63559   1.181 0.237689    
## day_of_weekmon                -0.33495    0.16360  -2.047 0.040627 *  
## day_of_weekthu                -0.09909    0.15932  -0.622 0.533975    
## day_of_weektue                -0.22911    0.16024  -1.430 0.152768    
## day_of_weekwed                -0.12754    0.15840  -0.805 0.420729    
## campaign                      -0.04732    0.02489  -1.901 0.057276 .  
## poutcomenonexistent            0.49247    0.16084   3.062 0.002199 ** 
## poutcomesuccess                1.96530    0.31788   6.183 6.31e-10 ***
## cons.price.idx                -0.03878    0.01195  -3.246 0.001172 ** 
## nr.employed                   -0.26161    0.02568 -10.187  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 2483.1  on 2326  degrees of freedom
## AIC: 2559.1
## 
## Number of Fisher Scoring iterations: 11
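Logistic regression coefficients are on the log-odds scale, so exponentiating an estimate gives an odds ratio relative to the baseline level, which is what keeps the model interpretable. A minimal Python sketch using three estimates and standard errors copied from the reduced model above, with Wald 95% intervals exp(β ± 1.96·SE); odds_ratio_ci is a hypothetical helper. The baselines are the alphabetically first levels: contact = cellular, month = apr, poutcome = failure.

```python
import math

# (estimate, standard error) copied from the reduced model summary above.
ESTIMATES = {
    "contacttelephone": (-0.56018, 0.15255),
    "monthmay": (-0.96424, 0.19709),
    "poutcomesuccess": (1.96530, 0.31788),
}


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with a Wald interval exp(beta +/- z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


for name, (beta, se) in ESTIMATES.items():
    or_, lo, hi = odds_ratio_ci(beta, se)
    print(f"{name:18s} OR = {or_:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

For example, with the other terms held fixed, the estimated odds of subscribing are roughly 7 times higher when the previous campaign succeeded than when it failed.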

Stepwise Feature Selection

## 
## Call:
## glm(formula = y ~ education + month + poutcome + emp.var.rate + 
##     cons.price.idx + nr.employed, family = "binomial", data = trainingsData2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7280  -0.8653  -0.2898   0.7726   1.9281  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    2.93862    0.29190  10.067  < 2e-16 ***
## educationbasic.6y              0.19722    0.27239   0.724  0.46904    
## educationbasic.9y              0.17020    0.20988   0.811  0.41740    
## educationhigh.school           0.24921    0.19243   1.295  0.19528    
## educationilliterate           11.94159  324.74377   0.037  0.97067    
## educationprofessional.course   0.10790    0.21235   0.508  0.61138    
## educationuniversity.degree     0.49853    0.18518   2.692  0.00710 ** 
## educationunknown               0.30811    0.27294   1.129  0.25896    
## monthaug                      -0.39956    0.23522  -1.699  0.08938 .  
## monthdec                       1.31327    1.05090   1.250  0.21142    
## monthjul                       0.10156    0.23137   0.439  0.66070    
## monthjun                       0.16464    0.24616   0.669  0.50361    
## monthmar                       0.28862    0.37423   0.771  0.44057    
## monthmay                      -1.12716    0.19059  -5.914 3.34e-09 ***
## monthnov                      -0.86736    0.24104  -3.598  0.00032 ***
## monthoct                      -0.09388    0.33879  -0.277  0.78170    
## monthsep                       0.69254    0.63198   1.096  0.27316    
## poutcomenonexistent            0.43500    0.15821   2.750  0.00597 ** 
## poutcomesuccess                1.94412    0.31647   6.143 8.09e-10 ***
## emp.var.rate                  -0.05198    0.02922  -1.779  0.07526 .  
## cons.price.idx                -0.05506    0.01097  -5.017 5.25e-07 ***
## nr.employed                   -0.26909    0.02694  -9.988  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 2511.9  on 2342  degrees of freedom
## AIC: 2555.9
## 
## Number of Fisher Scoring iterations: 11
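As a consistency check on the summaries above: for a logistic regression fit to 0/1 responses, the residual deviance equals −2 times the log-likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients (n minus the residual degrees of freedom). The Python sketch reproduces the reported AIC values from the printed deviances. It makes the trade-off explicit: the stepwise model uses 22 coefficients instead of 38 and still has a slightly lower AIC than the reduced model, while the full model's much lower AIC comes with duration, which the EDA flagged for removal. aic_from_deviance is a hypothetical helper.

```python
N_OBS = 2364  # null deviance is on 2363 df, so n = 2363 + 1

# (residual deviance, residual df) copied from the summaries above.
MODELS = {
    "full, with duration": (1450.9, 2312),
    "reduced": (2483.1, 2326),
    "stepwise": (2511.9, 2342),
}


def aic_from_deviance(residual_deviance, residual_df, n=N_OBS):
    """AIC = residual deviance + 2k for a 0/1-response logistic glm.

    k = n - residual_df is the number of estimated coefficients,
    including the intercept.
    """
    k = n - residual_df
    return residual_deviance + 2 * k


for name, (dev, df) in MODELS.items():
    print(f"{name:20s} AIC = {aic_from_deviance(dev, df):.1f}")
```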

Forward Feature Selection

## 
## Call:
## glm(formula = y ~ age + job + month + day_of_week + cons.price.idx + 
##     nr.employed + poutcome, family = "binomial", data = trainingsData2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7229  -0.8409  -0.2156   0.7854   1.9776  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          3.621869   0.348873  10.382  < 2e-16 ***
## age                 -0.009759   0.005424  -1.799  0.07199 .  
## jobblue-collar      -0.250931   0.147475  -1.702  0.08885 .  
## jobentrepreneur      0.314275   0.258223   1.217  0.22358    
## jobhousemaid        -0.280920   0.328706  -0.855  0.39276    
## jobmanagement        0.260030   0.207874   1.251  0.21097    
## jobretired           0.542505   0.289440   1.874  0.06088 .  
## jobself-employed    -0.011980   0.274010  -0.044  0.96513    
## jobservices         -0.022122   0.192990  -0.115  0.90874    
## jobstudent           0.106698   0.294131   0.363  0.71679    
## jobtechnician       -0.048868   0.156202  -0.313  0.75439    
## jobunemployed        0.041558   0.320659   0.130  0.89688    
## jobunknown           0.352375   0.540254   0.652  0.51425    
## monthaug            -0.514533   0.221117  -2.327  0.01997 *  
## monthdec             1.212915   1.053470   1.151  0.24959    
## monthjul            -0.054247   0.218595  -0.248  0.80401    
## monthjun             0.086295   0.239963   0.360  0.71913    
## monthmar             0.278650   0.375858   0.741  0.45847    
## monthmay            -1.177753   0.190793  -6.173 6.70e-10 ***
## monthnov            -0.648079   0.228295  -2.839  0.00453 ** 
## monthoct            -0.204376   0.346627  -0.590  0.55545    
## monthsep             0.628358   0.633112   0.992  0.32096    
## day_of_weekmon      -0.318913   0.162564  -1.962  0.04979 *  
## day_of_weekthu      -0.092060   0.157822  -0.583  0.55968    
## day_of_weektue      -0.224123   0.159016  -1.409  0.15871    
## day_of_weekwed      -0.127357   0.156469  -0.814  0.41568    
## cons.price.idx      -0.060041   0.010648  -5.639 1.71e-08 ***
## nr.employed         -0.283311   0.025169 -11.256  < 2e-16 ***
## poutcomenonexistent  0.452561   0.160639   2.817  0.00484 ** 
## poutcomesuccess      1.995275   0.318730   6.260 3.85e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 2508.2  on 2334  degrees of freedom
## AIC: 2568.2
## 
## Number of Fisher Scoring iterations: 5
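Greedy searches can end at different subsets depending on their direction; here the forward-selected model's AIC (2568.2) is higher than the stepwise model's (2555.9). A minimal sketch of greedy forward selection: start from the intercept-only model and repeatedly add the variable that lowers the criterion most, stopping when no addition helps. R's step() compares models by AIC by default; in this sketch real model fitting is replaced by a made-up additive scoring function, toy_aic, with invented numbers, so only the search logic is illustrated.

```python
# Invented per-variable deviance reductions and AIC penalties (2 per
# coefficient the variable adds); for illustration only.
DEVIANCE_DROP = {"nr.employed": 120.0, "poutcome": 60.0, "month": 40.0, "age": 1.0}
PENALTY = {"nr.employed": 2, "poutcome": 4, "month": 18, "age": 2}
NULL_AIC = 3279.2  # null deviance 3277.2 plus 2 for the intercept


def toy_aic(subset):
    """Additive stand-in for refitting a glm on `subset` and reading its AIC."""
    return (NULL_AIC
            - sum(DEVIANCE_DROP[v] for v in subset)
            + sum(PENALTY[v] for v in subset))


def forward_select(candidates, score):
    """Greedy forward selection: add the variable that lowers score() most."""
    selected = []
    best = score(tuple(selected))
    remaining = list(candidates)
    while remaining:
        trials = [(score(tuple(selected + [v])), v) for v in remaining]
        trial_score, var = min(trials)
        if trial_score >= best:  # no addition improves the criterion
            break
        selected.append(var)
        remaining.remove(var)
        best = trial_score
    return selected, best


chosen, aic = forward_select(["age", "month", "nr.employed", "poutcome"], toy_aic)
print(chosen, round(aic, 1))
```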

Backwards Feature Selection

## 
## Call:
## glm(formula = y ~ education + month + poutcome + emp.var.rate + 
##     cons.price.idx + nr.employed, family = "binomial", data = trainingsData2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7280  -0.8653  -0.2898   0.7726   1.9281  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    2.93862    0.29190  10.067  < 2e-16 ***
## educationbasic.6y              0.19722    0.27239   0.724  0.46904    
## educationbasic.9y              0.17020    0.20988   0.811  0.41740    
## educationhigh.school           0.24921    0.19243   1.295  0.19528    
## educationilliterate           11.94159  324.74377   0.037  0.97067    
## educationprofessional.course   0.10790    0.21235   0.508  0.61138    
## educationuniversity.degree     0.49853    0.18518   2.692  0.00710 ** 
## educationunknown               0.30811    0.27294   1.129  0.25896    
## monthaug                      -0.39956    0.23522  -1.699  0.08938 .  
## monthdec                       1.31327    1.05090   1.250  0.21142    
## monthjul                       0.10156    0.23137   0.439  0.66070    
## monthjun                       0.16464    0.24616   0.669  0.50361    
## monthmar                       0.28862    0.37423   0.771  0.44057    
## monthmay                      -1.12716    0.19059  -5.914 3.34e-09 ***
## monthnov                      -0.86736    0.24104  -3.598  0.00032 ***
## monthoct                      -0.09388    0.33879  -0.277  0.78170    
## monthsep                       0.69254    0.63198   1.096  0.27316    
## poutcomenonexistent            0.43500    0.15821   2.750  0.00597 ** 
## poutcomesuccess                1.94412    0.31647   6.143 8.09e-10 ***
## emp.var.rate                  -0.05198    0.02922  -1.779  0.07526 .  
## cons.price.idx                -0.05506    0.01097  -5.017 5.25e-07 ***
## nr.employed                   -0.26909    0.02694  -9.988  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 2511.9  on 2342  degrees of freedom
## AIC: 2555.9
## 
## Number of Fisher Scoring iterations: 11
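Backward elimination arrived at the same model as the stepwise search above (same formula, AIC 2555.9). A minimal sketch of the idea: start from all candidate variables and repeatedly drop the one whose removal lowers the criterion most, stopping when every removal makes it worse. As in a forward search, the real glm refits are replaced by a made-up additive scoring function, toy_aic, with invented numbers; only the search logic is meant to carry over.

```python
# Invented per-variable deviance reductions and AIC penalties (2 per
# coefficient the variable adds); for illustration only.
DEVIANCE_DROP = {"nr.employed": 120.0, "poutcome": 60.0, "month": 40.0, "age": 1.0}
PENALTY = {"nr.employed": 2, "poutcome": 4, "month": 18, "age": 2}
NULL_AIC = 3279.2  # null deviance 3277.2 plus 2 for the intercept


def toy_aic(subset):
    """Additive stand-in for refitting a glm on `subset` and reading its AIC."""
    return (NULL_AIC
            - sum(DEVIANCE_DROP[v] for v in subset)
            + sum(PENALTY[v] for v in subset))


def backward_eliminate(variables, score):
    """Greedy backward elimination: drop the variable whose removal lowers score() most."""
    selected = list(variables)
    best = score(tuple(selected))
    while selected:
        trials = [(score(tuple(v for v in selected if v != drop)), drop)
                  for drop in selected]
        trial_score, drop = min(trials)
        if trial_score >= best:  # every removal makes the criterion worse
            break
        selected.remove(drop)
        best = trial_score
    return selected, best


kept, aic = backward_eliminate(["age", "month", "nr.employed", "poutcome"], toy_aic)
print(kept, round(aic, 1))
```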

LASSO Feature Selection

##  [1] "age"            "job"            "marital"        "education"     
##  [5] "default"        "housing"        "loan"           "contact"       
##  [9] "month"          "day_of_week"    "campaign"       "previous"      
## [13] "poutcome"       "emp.var.rate"   "cons.price.idx" "cons.conf.idx" 
## [17] "euribor3m"      "nr.employed"    "y"              "pdays_0"
##  [1] "age"            "job"            "marital"        "education"     
##  [5] "default"        "housing"        "loan"           "contact"       
##  [9] "month"          "day_of_week"    "duration"       "campaign"      
## [13] "previous"       "poutcome"       "emp.var.rate"   "cons.price.idx"
## [17] "cons.conf.idx"  "euribor3m"      "nr.employed"    "y"             
## [21] "pdays_0"

## 54 x 1 sparse Matrix of class "dgCMatrix"
##                                         1
## (Intercept)                   2.260117386
## (Intercept)                   .          
## age                           .          
## jobblue-collar                .          
## jobentrepreneur               .          
## jobhousemaid                  .          
## jobmanagement                 .          
## jobretired                    .          
## jobself-employed              .          
## jobservices                   .          
## jobstudent                    .          
## jobtechnician                 .          
## jobunemployed                 .          
## jobunknown                    .          
## maritalmarried                .          
## maritalsingle                 .          
## maritalunknown                .          
## educationbasic.6y             .          
## educationbasic.9y             .          
## educationhigh.school          .          
## educationilliterate           .          
## educationprofessional.course  .          
## educationuniversity.degree    .          
## educationunknown              .          
## defaultunknown                .          
## defaultyes                    .          
## housingunknown                .          
## housingyes                    .          
## loanunknown                   .          
## loanyes                       .          
## contacttelephone             -0.299815182
## monthaug                      .          
## monthdec                      .          
## monthjul                      .          
## monthjun                      .          
## monthmar                      .          
## monthmay                     -0.428366909
## monthnov                      .          
## monthoct                      .          
## monthsep                      .          
## day_of_weekmon                .          
## day_of_weekthu                .          
## day_of_weektue                .          
## day_of_weekwed                .          
## campaign                      .          
## previous                      .          
## poutcomenonexistent           .          
## poutcomesuccess               0.460833002
## emp.var.rate                  .          
## cons.price.idx                .          
## cons.conf.idx                 .          
## euribor3m                    -0.002943405
## nr.employed                  -0.179885021
## pdays_0                       .
## [1] "CV Error Rate:"
## [1] 0.2563452
## [1] "Penalty Value:"
## [1] 0.03797573
## 54 x 1 sparse Matrix of class "dgCMatrix"
##                                        s0
## (Intercept)                   2.260283194
## (Intercept)                   .          
## age                           .          
## jobblue-collar                .          
## jobentrepreneur               .          
## jobhousemaid                  .          
## jobmanagement                 .          
## jobretired                    .          
## jobself-employed              .          
## jobservices                   .          
## jobstudent                    .          
## jobtechnician                 .          
## jobunemployed                 .          
## jobunknown                    .          
## maritalmarried                .          
## maritalsingle                 .          
## maritalunknown                .          
## educationbasic.6y             .          
## educationbasic.9y             .          
## educationhigh.school          .          
## educationilliterate           .          
## educationprofessional.course  .          
## educationuniversity.degree    .          
## educationunknown              .          
## defaultunknown                .          
## defaultyes                    .          
## housingunknown                .          
## housingyes                    .          
## loanunknown                   .          
## loanyes                       .          
## contacttelephone             -0.299767682
## monthaug                      .          
## monthdec                      .          
## monthjul                      .          
## monthjun                      .          
## monthmar                      .          
## monthmay                     -0.428231546
## monthnov                      .          
## monthoct                      .          
## monthsep                      .          
## day_of_weekmon                .          
## day_of_weekthu                .          
## day_of_weektue                .          
## day_of_weekwed                .          
## campaign                      .          
## previous                      .          
## poutcomenonexistent           .          
## poutcomesuccess               0.460759481
## emp.var.rate                  .          
## cons.price.idx                .          
## cons.conf.idx                 .          
## euribor3m                    -0.002952097
## nr.employed                  -0.179658867
## pdays_0                       .
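The ‘.’ entries in the coefficient matrices above are coefficients that the L1 penalty set exactly to zero. A simplified illustration of why: in coordinate-descent LASSO (the approach used by glmnet), each coefficient update passes through the soft-thresholding operator S(z, λ) = sign(z)·max(|z| − λ, 0), which returns exactly zero whenever |z| ≤ λ. The Python sketch shows only that operator, not glmnet's full weighted update for logistic regression; the z values are made up, and λ is the penalty value reported above.

```python
LAMBDA = 0.03797573  # penalty value reported by cross-validation above


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized partial-fit values for three coefficients.
for z in (0.02, -0.5, 0.1):
    print(z, "->", soft_threshold(z, LAMBDA))
```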
## 
## Call:
## glm(formula = y ~ job + marital + education + default + contact + 
##     month + day_of_week + campaign + poutcome + emp.var.rate + 
##     cons.conf.idx + cons.price.idx + nr.employed, family = "binomial", 
##     data = trainingsData2L)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7722  -0.8487  -0.2170   0.8041   2.0854  
## 
## Coefficients:
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   2.449e+00  4.957e-01   4.939 7.84e-07 ***
## jobblue-collar               -2.410e-02  1.852e-01  -0.130  0.89644    
## jobentrepreneur               3.337e-01  2.667e-01   1.251  0.21093    
## jobhousemaid                 -7.971e-03  3.396e-01  -0.023  0.98127    
## jobmanagement                 1.913e-01  2.120e-01   0.902  0.36687    
## jobretired                    6.528e-01  2.774e-01   2.353  0.01862 *  
## jobself-employed             -3.082e-03  2.808e-01  -0.011  0.99124    
## jobservices                   1.459e-01  2.075e-01   0.703  0.48202    
## jobstudent                    3.690e-01  2.964e-01   1.245  0.21322    
## jobtechnician                 1.504e-01  1.813e-01   0.830  0.40672    
## jobunemployed                 1.974e-01  3.285e-01   0.601  0.54781    
## jobunknown                    1.983e-01  6.006e-01   0.330  0.74126    
## maritalmarried                8.689e-02  1.642e-01   0.529  0.59673    
## maritalsingle                 7.111e-02  1.799e-01   0.395  0.69270    
## maritalunknown                1.247e+00  1.007e+00   1.238  0.21567    
## educationbasic.6y             2.549e-01  2.842e-01   0.897  0.36983    
## educationbasic.9y             2.471e-01  2.237e-01   1.105  0.26920    
## educationhigh.school          2.783e-01  2.271e-01   1.226  0.22036    
## educationilliterate           1.173e+01  3.247e+02   0.036  0.97119    
## educationprofessional.course  8.871e-02  2.511e-01   0.353  0.72387    
## educationuniversity.degree    5.344e-01  2.263e-01   2.361  0.01821 *  
## educationunknown              2.903e-01  2.965e-01   0.979  0.32750    
## defaultunknown               -1.082e-01  1.443e-01  -0.750  0.45350    
## contacttelephone             -5.280e-01  1.792e-01  -2.947  0.00321 ** 
## monthaug                     -4.972e-01  2.977e-01  -1.670  0.09493 .  
## monthdec                      1.439e+00  1.072e+00   1.343  0.17929    
## monthjul                     -7.542e-02  2.413e-01  -0.313  0.75461    
## monthjun                      3.146e-01  2.530e-01   1.244  0.21365    
## monthmar                      4.018e-01  3.788e-01   1.061  0.28874    
## monthmay                     -9.460e-01  2.008e-01  -4.710 2.47e-06 ***
## monthnov                     -7.021e-01  3.052e-01  -2.300  0.02145 *  
## monthoct                     -9.108e-03  3.721e-01  -0.024  0.98047    
## monthsep                      7.869e-01  6.527e-01   1.206  0.22798    
## day_of_weekmon               -3.316e-01  1.639e-01  -2.023  0.04306 *  
## day_of_weekthu               -1.047e-01  1.600e-01  -0.654  0.51293    
## day_of_weektue               -2.331e-01  1.606e-01  -1.451  0.14665    
## day_of_weekwed               -1.413e-01  1.593e-01  -0.887  0.37509    
## campaign                     -4.627e-02  2.488e-02  -1.860  0.06293 .  
## poutcomenonexistent           4.916e-01  1.606e-01   3.060  0.00221 ** 
## poutcomesuccess               1.961e+00  3.176e-01   6.174 6.66e-10 ***
## emp.var.rate                 -1.889e-02  3.926e-02  -0.481  0.63036    
## cons.conf.idx                 3.209e-04  1.574e-02   0.020  0.98374    
## cons.price.idx               -3.765e-02  1.285e-02  -2.931  0.00338 ** 
## nr.employed                  -2.538e-01  3.327e-02  -7.630 2.34e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 2480.4  on 2320  degrees of freedom
## AIC: 2568.4
## 
## Number of Fisher Scoring iterations: 11
## 
## Call:
## glm(formula = y ~ ., family = "binomial", data = trainingsData2L)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8537  -0.8488  -0.2177   0.8089   2.1028  
## 
## Coefficients: (1 not defined because of singularities)
##                                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    2.812371   0.634197   4.435 9.23e-06 ***
## age                           -0.007469   0.006196  -1.206 0.228009    
## jobblue-collar                -0.049024   0.186391  -0.263 0.792538    
## jobentrepreneur                0.319826   0.267001   1.198 0.230977    
## jobhousemaid                   0.025708   0.338817   0.076 0.939517    
## jobmanagement                  0.199103   0.213673   0.932 0.351434    
## jobretired                     0.794598   0.303920   2.614 0.008936 ** 
## jobself-employed              -0.004774   0.281532  -0.017 0.986471    
## jobservices                    0.123454   0.208364   0.592 0.553522    
## jobstudent                     0.282249   0.303763   0.929 0.352798    
## jobtechnician                  0.142048   0.181629   0.782 0.434170    
## jobunemployed                  0.189268   0.329348   0.575 0.565510    
## jobunknown                     0.229169   0.599743   0.382 0.702379    
## maritalmarried                 0.067850   0.165224   0.411 0.681327    
## maritalsingle                 -0.008685   0.191101  -0.045 0.963752    
## maritalunknown                 1.199728   1.005500   1.193 0.232804    
## educationbasic.6y              0.238947   0.285765   0.836 0.403062    
## educationbasic.9y              0.224050   0.225524   0.993 0.320484    
## educationhigh.school           0.233537   0.230729   1.012 0.311456    
## educationilliterate           11.779753 324.743901   0.036 0.971064    
## educationprofessional.course   0.055129   0.253175   0.218 0.827624    
## educationuniversity.degree     0.496164   0.229259   2.164 0.030448 *  
## educationunknown               0.289447   0.297665   0.972 0.330855    
## defaultunknown                -0.081148   0.145868  -0.556 0.577997    
## housingunknown                -0.192886   0.337801  -0.571 0.567996    
## housingyes                    -0.047815   0.101232  -0.472 0.636689    
## loanunknown                          NA         NA      NA       NA    
## loanyes                       -0.148812   0.143864  -1.034 0.300952    
## contacttelephone              -0.511631   0.180910  -2.828 0.004683 ** 
## monthaug                      -0.456237   0.306334  -1.489 0.136397    
## monthdec                       1.376588   1.079201   1.276 0.202111    
## monthjul                      -0.115978   0.243464  -0.476 0.633815    
## monthjun                       0.249806   0.258254   0.967 0.333400    
## monthmar                       0.391764   0.380320   1.030 0.302968    
## monthmay                      -0.947464   0.202235  -4.685 2.80e-06 ***
## monthnov                      -0.730936   0.310600  -2.353 0.018608 *  
## monthoct                       0.005452   0.373545   0.015 0.988355    
## monthsep                       0.846085   0.658780   1.284 0.199030    
## day_of_weekmon                -0.328607   0.164240  -2.001 0.045417 *  
## day_of_weekthu                -0.111421   0.160408  -0.695 0.487301    
## day_of_weektue                -0.233664   0.161139  -1.450 0.147037    
## day_of_weekwed                -0.140542   0.159695  -0.880 0.378822    
## campaign                      -0.045726   0.024964  -1.832 0.066999 .  
## previous                      -0.016070   0.204675  -0.079 0.937417    
## poutcomenonexistent            0.503337   0.287042   1.754 0.079511 .  
## poutcomesuccess                1.887159   0.362979   5.199 2.00e-07 ***
## emp.var.rate                  -0.021809   0.039639  -0.550 0.582190    
## cons.price.idx                -0.032389   0.014911  -2.172 0.029848 *  
## cons.conf.idx                  0.005024   0.016965   0.296 0.767134    
## euribor3m                     -0.001953   0.002271  -0.860 0.389699    
## nr.employed                   -0.207482   0.062826  -3.303 0.000958 ***
## pdays_0                        0.022405   0.043187   0.519 0.603909    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3277.2  on 2363  degrees of freedom
## Residual deviance: 2476.3  on 2313  degrees of freedom
## AIC: 2578.3
## 
## Number of Fisher Scoring iterations: 11

ROC Curves

According to the AUC of the ROC curves, the stepwise model appears to have the upper hand.
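AUC can be computed without drawing the curve. It equals the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case, which is the Mann-Whitney rank-sum statistic rescaled to [0, 1]. The base-R sketch below illustrates this with made-up scores; `auc_rank()` is a hypothetical helper and is not necessarily how the AUC values in the table below were produced.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney U) identity.
# scores: predicted probabilities; labels: logical, TRUE for the positive class.
auc_rank <- function(scores, labels) {
  stopifnot(length(scores) == length(labels))
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  r <- rank(scores)  # average ranks, so tied scores count as half a "win"
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example (illustrative values only)
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
auc_rank(scores, labels)  # 8 of 9 positive/negative pairs are ordered correctly
```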

Comparing AIC and AUC

##       model      AIC   AUC
## 1  original 2559.079 0.784
## 2  stepwise 2555.858 0.933
## 3   forward 2568.152 0.782
## 4  backward 2555.858 0.782
## 5     lasso 2568.400 0.778

According to the AUC, the stepwise model has by far the largest area under the curve (0.933); the other four models are tightly grouped between 0.778 (Lasso) and 0.784 (original).

The AIC values of the five models are relatively close, ranging from 2555.858 to 2568.400; stepwise and backward share the lowest AIC (2555.858).
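For a logistic regression with a 0/1 response, AIC reduces to the residual deviance plus twice the number of estimated coefficients, AIC = D + 2k. The glm summaries above are consistent with this: the reduced model estimates 2364 - 2320 = 44 coefficients, so 2480.4 + 2 x 44 = 2568.4, and the full model estimates 2364 - 2313 = 51 (the aliased `loanunknown` is not counted), so 2476.3 + 2 x 51 = 2578.3. The sketch below checks the identity on the built-in `mtcars` data, used only as a stand-in because the bank training set is not reproduced here.

```r
# Stand-in data: built-in mtcars, with the 0/1 variable `am` as the response.
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# fit$rank counts only estimable coefficients, so aliased (NA) terms are excluded
k <- fit$rank
aic_manual <- deviance(fit) + 2 * k

c(AIC = AIC(fit), manual = aic_manual)  # the two values agree

# The same arithmetic applied to the two summaries printed above
2480.4 + 2 * (2364 - 2320)  # reduced model
2476.3 + 2 * (2364 - 2313)  # full model
```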

Odds Ratio Statistics

##                  (Intercept)            educationbasic.6y 
##                 1.888976e+01                 1.218015e+00 
##            educationbasic.9y         educationhigh.school 
##                 1.185541e+00                 1.283016e+00 
##          educationilliterate educationprofessional.course 
##                 1.535204e+05                 1.113933e+00 
##   educationuniversity.degree             educationunknown 
##                 1.646304e+00                 1.360848e+00 
##                     monthaug                     monthdec 
##                 6.706119e-01                 3.718323e+00 
##                     monthjul                     monthjun 
##                 1.106896e+00                 1.178965e+00 
##                     monthmar                     monthmay 
##                 1.334589e+00                 3.239509e-01 
##                     monthnov                     monthoct 
##                 4.200598e-01                 9.103909e-01 
##                     monthsep          poutcomenonexistent 
##                 1.998782e+00                 1.544962e+00 
##              poutcomesuccess                 emp.var.rate 
##                 6.987491e+00                 9.493502e-01 
##               cons.price.idx                  nr.employed 
##                 9.464297e-01                 7.640763e-01
## Waiting for profiling to be done...
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning in regularize.values(x, y, ties, missing(ties)): collapsing to unique
## 'x' values
##                                        OR        2.5 %     97.5 %
## (Intercept)                  1.888976e+01 1.070782e+01 33.6505670
## educationbasic.6y            1.218015e+00 7.116020e-01  2.0728255
## educationbasic.9y            1.185541e+00 7.864294e-01  1.7914535
## educationhigh.school         1.283016e+00 8.813852e-01  1.8749398
## educationilliterate          1.535204e+05 2.800916e-24         NA
## educationprofessional.course 1.113933e+00 7.351269e-01  1.6908780
## educationuniversity.degree   1.646304e+00 1.147383e+00  2.3724508
## educationunknown             1.360848e+00 7.968346e-01  2.3252260
## monthaug                     6.706119e-01 4.218880e-01  1.0617838
## monthdec                     3.718323e+00 7.149775e-01 68.4416690
## monthjul                     1.106896e+00 7.017134e-01  1.7395109
## monthjun                     1.178965e+00 7.271880e-01  1.9104463
## monthmar                     1.334589e+00 6.601269e-01  2.8962979
## monthmay                     3.239509e-01 2.216778e-01  0.4683999
## monthnov                     4.200598e-01 2.618415e-01  0.6747519
## monthoct                     9.103909e-01 4.756954e-01  1.8067027
## monthsep                     1.998782e+00 6.674132e-01  8.6480245
## poutcomenonexistent          1.544962e+00 1.133581e+00  2.1086789
## poutcomesuccess              6.987491e+00 3.865071e+00 13.4717836
## emp.var.rate                 9.493502e-01 8.970638e-01  1.0061900
## cons.price.idx               9.464297e-01 9.263164e-01  0.9671140
## nr.employed                  7.640763e-01 7.240384e-01  0.8048206
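Each odds ratio is an exponentiated logistic-regression coefficient: a value above 1 multiplies the odds of the modeled outcome, and a value below 1 shrinks them (for example, `poutcomesuccess` multiplies the odds by about 7 relative to the baseline outcome). The "Waiting for profiling to be done..." message suggests the intervals above were computed by profiling the likelihood, which is what `confint()` does for a `glm` fit. The sketch below uses simpler Wald intervals, exp(beta +/- 1.96 SE), which become unreliable for extreme estimates such as `educationilliterate`. The `mtcars` data is a stand-in for the bank data.

```r
# Stand-in data: built-in mtcars, with the 0/1 variable `am` as the response.
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

beta <- coef(fit)
se   <- sqrt(diag(vcov(fit)))
z    <- qnorm(0.975)

# Odds ratios with Wald 95% intervals, computed by hand
or_table <- cbind(OR    = exp(beta),
                  lower = exp(beta - z * se),
                  upper = exp(beta + z * se))
round(or_table, 4)

# confint.default() gives the same Wald intervals on the log-odds scale
exp(confint.default(fit))
```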

Cutoff Selection

The cutoff was chosen by manually iterating over candidate values, using the ROC curve as a reference to determine the best one.

##      Model  Accuracy Sensitivity Specificity   Average Cutoff
## 1 Original 0.8477488   0.8723916   0.5957201 0.7719535    0.4
## 2     Step 0.8740985   0.8777640   0.8366108 0.8628244    0.4
## 3  Forward 0.8464094   0.8708929   0.5960093 0.7711039    0.4
## 4 Backward 0.8448382   0.8691964   0.5957201 0.7699182    0.4
## 5    Lasso 0.8842211   0.9229486   0.4881434 0.7651044    0.4
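The manual iteration over cutoffs can be automated. In the tables above, `Average` equals the mean of accuracy, sensitivity, and specificity (for the stepwise model, (0.8741 + 0.8778 + 0.8366) / 3 is about 0.8628). The sketch below assumes an observation is classified as "yes" when its predicted probability exceeds the cutoff, and it treats "no" as the positive class, as the caret outputs elsewhere in this report do. The helper `cutoff_metrics()` and the toy inputs are hypothetical.

```r
# Hypothetical helper: classification metrics at one probability cutoff.
# prob: predicted P(y = "yes"); actual: character vector of "no"/"yes".
# "no" is treated as the positive class (an assumption matching caret's output here).
cutoff_metrics <- function(prob, actual, cutoff) {
  pred <- ifelse(prob > cutoff, "yes", "no")
  tp <- sum(pred == "no"  & actual == "no")
  tn <- sum(pred == "yes" & actual == "yes")
  fp <- sum(pred == "no"  & actual == "yes")
  fn <- sum(pred == "yes" & actual == "no")
  acc  <- (tp + tn) / length(actual)
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  c(Cutoff = cutoff, Accuracy = acc, Sensitivity = sens,
    Specificity = spec, Average = mean(c(acc, sens, spec)))
}

# Toy inputs (illustrative only)
prob   <- c(0.05, 0.20, 0.45, 0.70, 0.30, 0.85)
actual <- c("no", "no", "no", "yes", "yes", "yes")

# Sweep a grid of cutoffs and stack the results into one table
grid <- seq(0.1, 0.9, by = 0.1)
results <- t(sapply(grid, function(k) cutoff_metrics(prob, actual, k)))
round(results, 4)
```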

Objective 2: Complex Model


The goal of the second model is to improve predictive performance, accepting some loss of interpretability.
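Much of the added complexity comes from interaction terms. In R's formula syntax, `a*b` expands to both main effects plus their interaction `a:b`, so a specification written with `*`, such as the `evr*cpi + cpi*em` model tested below, adds two interaction terms while sharing one main effect. The sketch uses placeholder names `a`, `b`, and `c` and a hypothetical four-row data frame; `terms()` and `model.matrix()` show the expansion without fitting a model.

```r
# Placeholder names: a, b, c stand in for predictors.
f <- y ~ a * b + b * c

# Term expansion: main effects first, then the two-way interactions
attr(terms(f), "term.labels")

# Columns of the design matrix that glm() would build from this formula
toy <- data.frame(y = c(0, 1, 0, 1), a = c(1, 2, 3, 4),
                  b = c(2, 1, 4, 3), c = c(5, 3, 2, 1))
colnames(model.matrix(f, data = toy))
```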

Initial Custom Model Testing

Final Custom Model Testing

##              Model  Accuracy Sensitivity Specificity  Average Cutoff
## 1 evr*cpi + cpi*em 0.8525139   0.8771136   0.6009254 0.776851    0.6
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##                 poutcome                    month              nr.employed 
##                 4.442001                 5.535462                32.676342 
##             emp.var.rate           cons.price.idx                euribor3m 
##                90.833566                 5.175381                 9.946142 
##                 duration           month:duration nr.employed:emp.var.rate 
##                21.213439                 6.659970               208.299506 
##        poutcome:duration 
##                 7.540782

## [1] 0.8693592
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -5.4878 -3.7967 -2.4126 -1.7493 -0.3549 42.0157
##                Model  Accuracy Sensitivity Specificity   Average Cutoff
## 1  Complex Log Model 0.8494746   0.8448510   0.8967611        NA   0.00
## 2  Complex Log Model 0.8527200   0.8490358   0.8903991        NA   0.05
## 3  Complex Log Model 0.8553472   0.8528813   0.8805668 0.8629318   0.10
## 4  Complex Log Model 0.8588760   0.8572923   0.8750723 0.8629318   0.15
## 5  Complex Log Model 0.8621986   0.8613923   0.8704453 0.8629318   0.20
## 6  Complex Log Model 0.8654183   0.8656337   0.8632157 0.8629318   0.25
## 7  Complex Log Model 0.8693592   0.8704123   0.8585888 0.8629318   0.30
## 8  Complex Log Model 0.8725015   0.8747102   0.8499132 0.8629318   0.35
## 9  Complex Log Model 0.8754379   0.8785840   0.8432620 0.8629318   0.40
## 10 Complex Log Model 0.8778075   0.8820336   0.8345865 0.8629318   0.45
## 11 Complex Log Model 0.8802287   0.8852853   0.8285136 0.8629318   0.50
## 12 Complex Log Model 0.8825211   0.8886501   0.8198381 0.8629318   0.55
## 13 Complex Log Model 0.8846332   0.8917039   0.8123193 0.8629318   0.60
## 14 Complex Log Model 0.8869771   0.8950687   0.8042221 0.8629318   0.65
## 15 Complex Log Model 0.8889347   0.8979811   0.7964141 0.8629318   0.70
## 16 Complex Log Model 0.8909180   0.9011197   0.7865818 0.8629318   0.75
## 17 Complex Log Model 0.8931589   0.9044845   0.7773279 0.8629318   0.80
## 18 Complex Log Model 0.8949619   0.9074818   0.7669173 0.8629318   0.85
## 19 Complex Log Model 0.8965331   0.9106486   0.7521689 0.8629318   0.90
## 20 Complex Log Model 0.8981558   0.9132500   0.7437825 0.8629318   0.95
## 21 Complex Log Model 0.8992633   0.9155686   0.7325043 0.8629318   1.00

EDA


PCA

Training and test sets were set up using only the continuous variables in order to examine them with PCA.

The plot of PC2 against PC4 shows a clear separation between the classes, whereas the earlier pairs of components do not separate them as clearly.
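A sketch of this PCA step follows. It assumes the continuous predictors are centered and scaled before `prcomp()`; the report does not say whether scaling was used. The built-in `iris` measurements stand in for the bank's continuous variables, and `Species` stands in for the response; the final plot reproduces the kind of PC2-versus-PC4 view discussed above.

```r
# Stand-in data: iris numeric columns play the role of the continuous predictors.
X   <- iris[, 1:4]
cls <- iris$Species

# Center and scale so variables measured in different units contribute comparably
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

# Score plot for one pair of components, coloured by class
scores <- as.data.frame(pca$x)
plot(scores$PC2, scores$PC4, col = as.integer(cls),
     xlab = "PC2", ylab = "PC4", main = "PC2 vs PC4")
```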

LDA

Next, another competing model is created using only the continuous predictors, fitted with linear discriminant analysis (LDA) or quadratic discriminant analysis (QDA).
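A minimal LDA workflow can be sketched with `MASS::lda()`; MASS is a recommended package that ships with standard R installations, and it also provides `qda()` with the same interface. The `iris` data and the 100/50 split are stand-ins; with the bank data, the formula would use `y` and the continuous predictors.

```r
library(MASS)  # provides lda() and qda()

# Stand-in data and a simple random train/test split
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit LDA on the continuous predictors only
lda_fit <- lda(Species ~ ., data = train)

# predict() returns a list; $class holds the predicted labels
pred <- predict(lda_fit, newdata = test)$class

conf <- table(Prediction = pred, Reference = test$Species)
conf
accuracy <- sum(diag(conf)) / sum(conf)
accuracy
```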

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  19528    57
##        yes 15838  3401
##                                           
##                Accuracy : 0.5906          
##                  95% CI : (0.5857, 0.5955)
##     No Information Rate : 0.9109          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.1751          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.5522          
##             Specificity : 0.9835          
##          Pos Pred Value : 0.9971          
##          Neg Pred Value : 0.1768          
##              Prevalence : 0.9109          
##          Detection Rate : 0.5030          
##    Detection Prevalence : 0.5045          
##       Balanced Accuracy : 0.7678          
##                                           
##        'Positive' Class : no              
## 
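The statistics in this output follow directly from the four cell counts, with "no" as the positive class. Recomputing them by hand shows why the 0.5906 accuracy falls below the no-information rate of 0.9109: 15,838 of the 35,366 actual "no" cases are predicted as "yes". The counts below are copied from the confusion matrix above.

```r
# Cell counts copied from the LDA confusion matrix above ("no" = positive class)
tp <- 19528  # predicted no,  actually no
fn <- 15838  # predicted yes, actually no
fp <- 57     # predicted no,  actually yes
tn <- 3401   # predicted yes, actually yes
n  <- tp + fn + fp + tn

stats <- c(
  Accuracy    = (tp + tn) / n,
  Sensitivity = tp / (tp + fn),
  Specificity = tn / (tn + fp),
  PosPredVal  = tp / (tp + fp),
  NegPredVal  = tn / (tn + fn),
  Prevalence  = (tp + fn) / n
)
# Balanced accuracy is the mean of sensitivity and specificity
stats["BalancedAcc"] <- mean(stats[c("Sensitivity", "Specificity")])
round(stats, 4)
```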

LDA with PCA Variables

Here, an LDA is run on the principal-component scores instead of the original continuous variables.
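One way to do this is to fit the PCA on the training set only, project both sets with `predict()` on the `prcomp` object, and keep the first few scores as LDA inputs. The number of components kept (`k = 2`) and the `iris` stand-in data are assumptions for illustration, not the choices made in this report.

```r
library(MASS)  # provides lda()

set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit PCA on training predictors only, so no test information leaks in
pca <- prcomp(train[, 1:4], center = TRUE, scale. = TRUE)

k <- 2  # number of components kept (assumed)
train_pc <- data.frame(pca$x[, 1:k], Species = train$Species)
# predict.prcomp applies the training centering, scaling, and rotation
test_pc  <- data.frame(predict(pca, newdata = test[, 1:4])[, 1:k])

lda_pc <- lda(Species ~ ., data = train_pc)
pred   <- predict(lda_pc, newdata = test_pc)$class
acc    <- mean(pred == test$Species)  # test-set accuracy
acc
```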

Heatmap

## [1] 23
## [1] 23

Additional Models

Classification Tree Model

## 
## Classification tree:
## tree(formula = y ~ ., data = tree.data)
## Variables actually used in tree construction:
## [1] "nr.employed" "pdays"       "month"      
## Number of terminal nodes:  4 
## Residual mean deviance:  0.5694 = 23450 / 41180 
## Misclassification error rate: 0.1005 = 4140 / 41188

## 
## Classification tree:
## tree(formula = y ~ ., data = tree.data.train)
## Variables actually used in tree construction:
## [1] "euribor3m"   "nr.employed"
## Number of terminal nodes:  4 
## Residual mean deviance:  1.099 = 2594 / 2360 
## Misclassification error rate: 0.2563 = 606 / 2364

## Warning in prune.tree(tree.bank, best = 5): best is bigger than tree size

## 
## Classification tree:
## tree(formula = y ~ nr.employed + pdays + month + cons.price.idx + 
##     campaign + contact + education + age, data = tree.data, minsize = 5)
## Variables actually used in tree construction:
## [1] "nr.employed" "pdays"       "month"      
## Number of terminal nodes:  4 
## Residual mean deviance:  0.5694 = 23450 / 41180 
## Misclassification error rate: 0.1005 = 4140 / 41188

Random Forest

##    no   yes 
## 30192  8632
##         
## fit.pred    no   yes
##      no  28968  1224
##      yes  6398  2234

## [1] "Confusion matrix for LRF"
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    no   yes
##        no  11210  2524
##        yes 24156   934
##                                           
##                Accuracy : 0.3128          
##                  95% CI : (0.3082, 0.3174)
##     No Information Rate : 0.9109          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : -0.108          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.31697         
##             Specificity : 0.27010         
##          Pos Pred Value : 0.81622         
##          Neg Pred Value : 0.03723         
##              Prevalence : 0.91093         
##          Detection Rate : 0.28874         
##    Detection Prevalence : 0.35375         
##       Balanced Accuracy : 0.29353         
##                                           
##        'Positive' Class : no              
## 
## [1] "Overall accuracy for RF "
## [1] 0.3127962
##    no   yes 
## 35366  3458